Microdata Disclosure by Resampling – Empirical Findings for Business Survey Data
Abstract
A problem that statistical offices and research institutes face when releasing microdata is the preservation of confidentiality. Traditional methods of avoiding disclosure often destroy the structure of the data, i.e., information loss is potentially high. In this paper I discuss an alternative technique for creating scientific-use files that reproduce the characteristics of the original data quite well. It is based on Fienberg's (1997 and 1994) [5], [6] idea of estimating the empirical multivariate cumulative distribution function of the data and resampling from it in order to obtain synthetic data. The procedure creates datasets (the resamples) that have the same characteristics as the original survey data. I present some applications of this method with (a) simulated data and (b) innovation survey data, the Mannheim Innovation Panel (MIP), and compare resampling with a common method of disclosure control, namely disturbance with multiplicative error, with regard to confidentiality on the one hand and the suitability of the disturbed data for different kinds of analyses on the other. The results show that univariate distributions are better reproduced by unweighted resampling. Parameter estimates can be reproduced quite well if either (a) the resampling procedure implements the correlation structure of the original data as a scale or (b) the data are multiplicatively perturbed and a correction term is used. On average, anonymized data with multiplicatively perturbed values protect better against re-identification than the various resampling methods used.
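The two families of methods compared in the abstract can be illustrated with a minimal sketch. It is not the paper's procedure: the resampling step below is a simple smoothed bootstrap (rows drawn with replacement plus Gaussian kernel jitter), used here as a stand-in for sampling from an estimated multivariate distribution function, and the function names, the noise level `sd`, the `bandwidth`, and the simulated lognormal data are all assumptions made for illustration.

```python
import numpy as np


def multiplicative_noise(data, sd, rng):
    """Perturb every value by an independent factor drawn from N(1, sd^2)."""
    factors = rng.normal(loc=1.0, scale=sd, size=data.shape)
    return data * factors


def smoothed_resample(data, n_out, bandwidth, rng):
    """Draw rows with replacement and add Gaussian kernel jitter.

    This samples from a Gaussian kernel estimate of the joint distribution.
    The kernel covariance is bandwidth^2 times the sample covariance, so the
    correlation structure of the original data is carried over.
    (Illustrative stand-in, not the estimator used in the paper.)
    """
    n, k = data.shape
    rows = rng.integers(0, n, size=n_out)
    cov = np.cov(data, rowvar=False)
    jitter = rng.multivariate_normal(np.zeros(k), bandwidth**2 * cov, size=n_out)
    return data[rows] + jitter


def corr(a):
    """Correlation between the first two columns."""
    return np.corrcoef(a, rowvar=False)[0, 1]


rng = np.random.default_rng(0)

# Simulated "original" data: two correlated, positive, skewed variables.
original = np.exp(
    rng.multivariate_normal([3.0, 2.0], [[0.25, 0.15], [0.15, 0.25]], size=2000)
)

perturbed = multiplicative_noise(original, sd=0.1, rng=rng)
synthetic = smoothed_resample(original, n_out=2000, bandwidth=0.2, rng=rng)

for name, d in [("original", original), ("perturbed", perturbed), ("synthetic", synthetic)]:
    print(f"{name:10s} mean={d.mean(axis=0).round(2)} corr={corr(d):.3f}")
```

In this sketch the perturbed file keeps a one-to-one link between original and released records, whereas the resampled file does not. Assessing re-identification risk, as the paper does, would require a separate matching experiment that is not shown here.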
Similar resources
STATISTICAL COMMISSION and ECONOMIC COMMISSION FOR EUROPE, CONFERENCE OF EUROPEAN STATISTICIANS; COMMISSION OF THE EUROPEAN COMMUNITIES, EUROSTAT
Abstract: We present in this paper the first empirical comparison of SDC methods for continuous microdata. Based on re-identification experiments, we try to optimize the tradeoff between information loss and disclosure risk. SDC methods compared include additive noise, distortion by probability distribution, microaggregation, resampling, rank swapping and the novel approach based on lossy compr...
Seeking explanation in theory: Reflections on the social practices of organizations that distribute public use microdata files for research purposes
(2001). Seeking explanation in theory: Reflections on the social practices of organizations that distribute public use microdata files for research purposes. Public concern about personal privacy has recently focused on issues of Internet data security and personal information as big business. The scientific discourse about information privacy focuses on the crosspressures of maintaining conf...
Remote Data Access and the Risk of Disclosure from Linear Regression: An Empirical Study
In the endeavor of finding ways for easy data access for external researchers remote data access seems to be an attractive alternative to the current standard of data perturbation or restricted access only at designated data archives or research data centers. However, even if the microdata are not available directly, disclosure of sensitive information is still possible. We illustrate that an i...
A CRONYM : Data without Boundaries D
Disclosure limitation methods for protecting the confidentiality of respondents in survey microdata often use perturbative techniques which introduce measurement error into the categorical identifying variables. In addition, the data itself will often have measurement errors commonly arising from survey processes. There is a need for valid and practical ways to assess the protect...
Disclosure Risk from Factor Scores in a Remote Access Environment
Remote access is a promising tool for broadening the access to microdata without violating confidentiality requirements. In a remote access setting the user submits queries to a system provided by the statistical agency and only the results of the queries are reported back to the user. Since no direct access to the data is granted, generally no alteration of the underlying microdata is required...